Overview

It is difficult to define hit quality accurately with a single value such as play result. Play result is a good indicator of quality, but a good hit can also be characterized by other metrics such as launch angle and field position, and all of these qualities can be combined into a single metric. The idea is that features in the model are weighted differently based on difficulty, frequency, and importance in scoring. The levels within each feature are also weighted based on their value. For example, bearing has its own weight in the model, and each bearing option (left field, right field) also has a specific weight. The model creates a new weighted metric that holistically represents hit quality rather than just the play result.

Ultimately, the features for this model were chosen based on a mix of their effect on play result and survey responses I received from baseball players.

EDA and Weight Assignment

I will investigate several features, their relationships to each other, and their relationships to hit type and play result. Features that influence the quality of a hit, and that influence each other, should be included in the model. Overall, the quality of a hit depends on whether the ball makes it out of the infield and can result in a single, double, triple, or HR. The features I will explore are bearing, pitch type, exit speed, distance, launch angle, and hit type.

Bearing - Field Position

Field position plays a role in the likelihood of each play result. A ball hit to left field is more likely to result in a single, double, or home run than one hit to right field, and it is less likely to result in an out. For the other play results, there is no significant difference by field position. In summary, negative bearing (left field) should hold a higher weight than positive bearing (right field) within the model.
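The comparison above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function names, the play result labels, and the sample rows are made up, and the sketch assumes that negative bearing means left field and positive bearing means right field.

```python
from collections import Counter, defaultdict

def field_side(bearing_deg):
    """Classify a batted ball by the sign of its bearing.

    Assumption: negative bearing = left field, positive = right field.
    """
    if bearing_deg < 0:
        return "left"
    if bearing_deg > 0:
        return "right"
    return "center"

def result_rates_by_side(plays):
    """Return {side: {play_result: share of that side's plays}}."""
    counts = defaultdict(Counter)
    for bearing, result in plays:
        counts[field_side(bearing)][result] += 1
    return {
        side: {res: n / sum(c.values()) for res, n in c.items()}
        for side, c in counts.items()
    }

# Made-up rows purely for illustration: (bearing in degrees, play result)
sample_plays = [
    (-20.0, "Single"), (-12.5, "Out"), (-30.0, "HomeRun"), (-5.0, "Single"),
    (15.0, "Out"), (25.0, "Out"), (8.0, "Single"), (33.0, "Out"),
]
rates = result_rates_by_side(sample_plays)
```

Comparing `rates["left"]` with `rates["right"]` on the real data would show whether each play result is more common on one side of the field.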

Pitch Type

What makes a pitch difficult? Curveballs and sliders are widely regarded as among the hardest pitches to hit because they combine movement with speed. Using this data, I will first evaluate difficulty based on pitch frequency, on the assumption that pitchers tend to throw what is most effective. I will create initial weights using a weighted average by pitch type, then adjust them based on the baseball knowledge that breaking balls (curveballs and sliders) are some of the most difficult pitches to hit.
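As a rough sketch of this weighting step, the Python below computes each pitch type's share of all pitches and then applies manual multipliers before renormalizing. The pitch labels, the sample counts, and the 1.5 multiplier for breaking balls are hypothetical placeholders, not values from the analysis.

```python
from collections import Counter

def pitch_type_weights(pitch_types, adjustments=None):
    """Frequency-based weights per pitch type, with optional manual multipliers.

    `adjustments` maps a pitch type to a multiplier (hypothetical values);
    the weights are renormalized afterwards so they sum to 1.
    """
    counts = Counter(pitch_types)
    total = sum(counts.values())
    # Share of all pitches thrown, per pitch type
    weights = {p: n / total for p, n in counts.items()}
    # Apply manual adjustments, e.g. to boost breaking balls
    for p, factor in (adjustments or {}).items():
        if p in weights:
            weights[p] *= factor
    norm = sum(weights.values())
    return {p: w / norm for p, w in weights.items()}

# Made-up pitch log purely for illustration
sample = ["Fastball"] * 6 + ["Slider"] * 2 + ["Curveball"] + ["ChangeUp"]
base = pitch_type_weights(sample)
adjusted = pitch_type_weights(sample, {"Slider": 1.5, "Curveball": 1.5})
```

Renormalizing after the adjustment keeps the weights comparable to the unadjusted percent-of-total values: boosting breaking balls necessarily lowers the share of the remaining pitch types.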

Four-seam fastballs are the most frequently thrown pitch type and are most likely to result in a foul ball or an out. Below are the percent-of-total weights for each pitch type.

Distance

Quality contact means that the ball escapes the infield. A ball that reaches the outfield is less likely to result in an out and more likely to result in a single, double, triple, or, of course, a home run.

Exit Speed

Exit speed is extremely important because it affects how far the ball travels, and a ball that travels out of the infield is less likely to result in an out. There is a positive linear correlation between exit speed and distance.

Since distance and exit speed are positively correlated, and greater distance means the hit is more likely to result in a productive play such as a single, double, triple, or HR, high exit speed is also associated with these play results.
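One way to check the strength of this relationship is a Pearson correlation coefficient. The sketch below is a plain-Python illustration with made-up values. It drops rows where either value is missing or non-finite before computing the coefficient; that handling is an assumption of this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation, skipping pairs with missing or non-finite values."""
    pairs = [
        (x, y) for x, y in zip(xs, ys)
        if x is not None and y is not None
        and math.isfinite(x) and math.isfinite(y)
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Made-up values purely for illustration (mph and feet); the NaN row is skipped
exit_speed = [70.0, 85.0, float("nan"), 95.0, 105.0]
distance = [120.0, 210.0, 300.0, 310.0, 400.0]
r = pearson_r(exit_speed, distance)
```

A value of `r` close to 1 would support weighting higher exit speed more heavily, in line with the distance weighting.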

Note: the smoothed fit (`geom_smooth()`, formula y ~ x) omits 2,599 rows with non-finite values.

Hit Type

Note: the box plot (`stat_boxplot()`) omits 779 rows with non-finite values.

Model:

  • Exit speed
  • Pitch type
  • Bearing
  • Hit type
  • Launch angle
  • Distance
  • Outcome of the hit (play result)

Conclusion of Weights & Next Steps

After weights are applied to all of the features, coefficients would be created to calculate the weighted metric.

Exit speed, distance, bearing, and launch angle will be weighted equally (10% each) since they are related and dependent on each other. Within each feature:

  • Exit Speed (10%): Higher exit speed holds a higher weight than lower exit speed.
  • Distance (10%): Outfield-qualifying distance is weighted more than infield-qualifying distance.
  • Launch Angle (10%): An upward launch angle is weighted more than an initially downward launch angle.
  • Bearing (10%): Left field is weighted more than right field.

Pitch type (15%) and hit type (15%) would be weighted equally since both have a significant impact on play result and are related to each other. The levels within pitch type and hit type would be weighted using a weighted average (calculated by frequency), followed by manual adjustments based on common baseball knowledge about how difficult each pitch is to hit and how each hit type affects play result.

Play result (20%) would be weighted the most because scoring is important to winning games, and the better the outcome of a hit, the greater the likelihood of scoring a run. Play result weights would be ranked ordinally (for example, a home run is worth more than an out).
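To show how these pieces could combine, here is a minimal Python sketch of the weighted metric. The feature weights are the percentages listed above. Everything else is an assumption for illustration: the function name, the idea that each feature's level has already been scored between 0 and 1, and the example scores. As written, the listed percentages total 90%, so the sketch divides by the sum of the weights to keep the result between 0 and 1.

```python
# Feature weights as listed in this section (they total 0.90, not 1.00)
FEATURE_WEIGHTS = {
    "exit_speed": 0.10,
    "distance": 0.10,
    "launch_angle": 0.10,
    "bearing": 0.10,
    "pitch_type": 0.15,
    "hit_type": 0.15,
    "play_result": 0.20,
}

def hit_quality(level_scores, weights=FEATURE_WEIGHTS):
    """Weighted average of per-feature level scores (each assumed to be in [0, 1])."""
    missing = set(weights) - set(level_scores)
    if missing:
        raise KeyError(f"missing level scores for: {sorted(missing)}")
    total = sum(weights.values())
    # Normalize by the total weight so the metric stays in [0, 1]
    return sum(weights[f] * level_scores[f] for f in weights) / total

# Hypothetical level scores for one batted ball
example = {
    "exit_speed": 0.8, "distance": 0.9, "launch_angle": 0.7, "bearing": 1.0,
    "pitch_type": 0.6, "hit_type": 0.5, "play_result": 1.0,
}
score = hit_quality(example)
```

Under this scheme, a hit scoring 1 on every feature gets a quality of 1, and a hit that scores well only on play result is capped at that feature's share of the total weight.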